Punjabi University Patiala, India. Website: http://www.universitypunjabi.org, http://www.advancedcentrepunjabi.org, http://www.universitypunjabi.org/sangam/, http://www.advancedcentrepunjabi.org/intro1.asp


Project Progress

Starting date of the project: December 2006



  • The existing Gurmukhi OCR has been studied in detail, and its limitations and areas for improvement have been noted. Observations on the present Gurmukhi OCR:

    • Strengths:

      • Font-independent: recognizes most commonly used non-decorative Gurmukhi fonts.

      • Can handle skewed documents and touching vowels.

    • Limitations:

      • Susceptible to noise.

      • Works well only on clean documents.

      • Does not recognize touching consonants.

      • Does not recognize digits and some special symbols.

      • Works only on single-column text.

  • Work has begun on developing a corpus for training and testing the OCR.


  • Twenty-five books representing different fonts, time periods, publishers, and print qualities have been identified for the corpus, and around 1,000 pages have been scanned.

  • Segmentation algorithms for overlapping text lines and merged characters are being developed.
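As background for the segmentation work above, here is a minimal sketch of the common horizontal projection-profile approach to text-line segmentation. It is not the project's algorithm. The function names, the `max_height` threshold, and the rule of splitting an over-tall band once at its lowest-ink row are illustrative assumptions. Projection profiles separate lines cleanly only when blank rows lie between them. This is why overlapping Gurmukhi lines, where vowel signs above or below one line intrude into its neighbour, need extra handling.

```python
from typing import List, Tuple


def horizontal_profile(img: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary page image."""
    return [sum(row) for row in img]


def segment_lines(img: List[List[int]], max_height: int) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) spans for text lines.

    Assumption (illustrative only): any band of inked rows taller than
    max_height is taken to be two overlapping lines and is split once,
    at the interior row with the least ink.
    """
    profile = horizontal_profile(img)

    # Group consecutive inked rows into bands separated by blank rows.
    bands: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y
        elif count == 0 and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:  # band touching the bottom edge of the image
        bands.append((start, len(profile)))

    lines: List[Tuple[int, int]] = []
    for top, bottom in bands:
        if bottom - top > max_height:
            # Split at the lowest-ink row strictly inside the band so both
            # halves are non-empty; ties go to the first such row.
            cut = min(range(top + 1, bottom), key=lambda y: profile[y])
            lines.append((top, cut))
            lines.append((cut, bottom))
        else:
            lines.append((top, bottom))
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 1, 0],  # line 1
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0],  # lines 2 and 3 overlap: no blank row between them
        [1, 1, 1, 1],
        [0, 1, 0, 0],
        [1, 1, 1, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(segment_lines(page, max_height=3))
```

Splitting only once keeps the sketch short. Pages with several stacked overlaps would need the split applied recursively, or a smarter cut that follows the ink gap around individual glyphs rather than a straight row.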